An Efficient Algorithm for Chinese Postman Walk on Bi-directed de Bruijn Graphs

Authors

  • Vamsi Kundeti
  • Sanguthevar Rajasekaran
  • Hieu Dinh
Abstract

Sequence assembly from short reads is an important problem in biology. It is known that solving the sequence assembly problem exactly on a bi-directed de Bruijn graph or a string graph is intractable. However, finding a shortest double-stranded DNA string (SDDNA) containing all the k-long words in the reads seems to be a good heuristic for getting close to the original genome. This problem is equivalent to finding a cyclic Chinese Postman (CP) walk on the underlying unweighted bi-directed de Bruijn graph built from the reads. The Chinese Postman walk Problem (CPP) is solved by reducing it to a general bi-directed flow on this graph, which runs in O(|E|^2 log^2(|V|)) time. In this paper we show that the cyclic CPP on bi-directed graphs can be solved without reducing it to bi-directed flow. We present a Θ(p(|V| + |E|) log(|V|) + (d_max p)^3) time algorithm to solve the cyclic CPP on a weighted bi-directed de Bruijn graph, where p = max{|{v : d_in(v) - d_out(v) > 0}|, |{v : d_in(v) - d_out(v) < 0}|} and d_max = max_v |d_in(v) - d_out(v)|. Our algorithm performs asymptotically better than the bi-directed flow algorithm when the number of imbalanced nodes p is much smaller than the number of nodes in the bi-directed graph. In our experiments on various datasets, the value of p/|V| lies between 0.08% and 0.13% with 95% probability. Many practical bi-directed de Bruijn graphs do not have cyclic CP walks. In such cases it is not clear how the bi-directed flow can be useful in identifying contigs. Our algorithm can handle such situations and identify maximal bi-directed sub-graphs that have CP walks. A Θ(p(|V| + |E|)) time heuristic algorithm based on these ideas has been implemented for the SDDNA problem. This algorithm was tested on short reads from a plant genome and achieves an approximation ratio of at most 1.0134.
We also present a Θ((|V| + |E|) log(|V|)) time algorithm for the single source shortest path problem on bi-directed de Bruijn graphs, which may be of independent interest.
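
As an illustration of the parameters p and d_max that appear in the running-time bound, the following minimal Python sketch counts imbalanced nodes in a small bi-directed graph. The edge representation (each edge carries an "in" or "out" orientation at both endpoints) and the function name imbalance_stats are assumptions made for this example only; they are not taken from the paper, and the sketch does not implement the CP walk algorithm itself.

```python
from collections import defaultdict


def imbalance_stats(edges):
    """Return (p, d_max, balance) for a bi-directed graph.

    Assumed (hypothetical) representation: each edge is a tuple
    (u, u_end, v, v_end), where u_end and v_end are "in" or "out"
    and give the orientation of the edge at that endpoint.
    """
    balance = defaultdict(int)  # node -> d_in(node) - d_out(node)
    for u, u_end, v, v_end in edges:
        for node, end in ((u, u_end), (v, v_end)):
            if end not in ("in", "out"):
                raise ValueError(f"unknown orientation: {end!r}")
            balance[node] += 1 if end == "in" else -1

    positive = sum(1 for b in balance.values() if b > 0)
    negative = sum(1 for b in balance.values() if b < 0)
    # p is the larger of the two imbalanced-node counts.
    p = max(positive, negative)
    # d_max is the largest absolute imbalance over all nodes.
    d_max = max((abs(b) for b in balance.values()), default=0)
    return p, d_max, dict(balance)


# Toy graph: a -> b, a -> c, b -> c, written with per-endpoint orientations.
edges = [
    ("a", "out", "b", "in"),
    ("a", "out", "c", "in"),
    ("b", "out", "c", "in"),
]
p, d_max, balance = imbalance_stats(edges)
print(p, d_max, balance)
```

In this toy graph node a has two more outgoing than incoming edge ends, node c has two more incoming than outgoing, and node b is balanced, so one node falls on each side of the imbalance. The abstract reports that in practice p is a very small fraction of |V|, which is what makes a bound expressed in terms of p attractive.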

Similar resources

The Euler and Chinese Postman Problems on 2-Arc-Colored Digraphs

The famous Chinese Postman Problem (CPP) is polynomial time solvable on both undirected and directed graphs. Gutin et al. [Discrete Applied Math 217 (2016)] generalized these results by proving that CPP on c-edge-colored graphs is polynomial time solvable for every c ≥ 2. In CPP on weighted edge-colored graphs G, we wish to find a minimum weight properly colored closed walk containing all edges...

Computability of Models for Sequence Assembly

Graph-theoretic models have come to the forefront as some of the most powerful and practical methods for sequence assembly. Simultaneously, the computational hardness of the underlying graph algorithms has remained open. Here we present two theoretical results about the complexity of these models for sequence assembly. In the first part, we show sequence assembly to be NP-hard under two differe...

Chinese Postman Problem on Edge-Colored Multigraphs

It is well-known that the Chinese Postman Problem on undirected and directed graphs is polynomial-time solvable. We extend this result to edge-colored multigraphs. Our result is in sharp contrast to the Chinese Postman Problem on mixed graphs, i.e., graphs with directed and undirected edges, for which the problem is NP-hard.

The Chinese Postman Problem in Regular Graphs of Odd Degree

The Chinese Postman Problem in a graph is the problem of finding a shortest closed walk traversing all the edges. In a (2r + 1)-regular graph, the problem is equivalent to finding a smallest spanning subgraph in which all vertices have odd degree. We establish a sharp upper bound for the solution in 3-regular graphs, characterize when equality holds, and conjecture the answer for general r.

Approximating the length of Chinese postman tours

This article develops simple and easy-to-use approximation formulae for the length of a Chinese Postman Problem (CPP) optimal tour on directed and undirected strongly connected planar graphs as a function of the number of nodes and the number of arcs for graphs whose nodes are randomly distributed on a unit square area. These approximations, obtained from a multi-linear regression analysis, all...

Journal:
  • Discrete Mathematics, Algorithms and Applications

Volume 1

Publication date: 2010